Assume we are given a SNP matrix $G \in \mathbb{R}^{N \times D}$. Standardizing it sets the mean to zero and the variance to one for each SNP (column) $j$.
The sample variance of SNP $j$ is then: $$ \text{var}_j = \frac{1}{N} \sum_{i=1}^N (G_{ij} - \mu_j)^2 = \frac{1}{N} \sum_{i=1}^N G_{ij}^2 = 1, $$ where the middle equality uses that $\mu_j = 0$ after standardization.
Thus, summing the squared entries (as in the "new" normalization scheme), we get:
$$ ss = \sum_{i=1}^N \sum_{j=1}^D G_{ij}^2 = \sum_{j=1}^D N \cdot \text{var}_j = N \sum_{j=1}^D 1 = N \cdot D $$

Hence, normalizing $G$ by $\sqrt{\frac{ss}{N}}$ is equivalent to normalizing by $\sqrt{D}$ when $G$ is unit standardized.
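This equivalence can be checked with plain NumPy, doing the unit standardization by hand (a minimal sketch, independent of pysnptools):

```python
import numpy as np

np.random.seed(0)
N, D = 10, 100
G = np.random.random((N, D))

# unit-standardize each SNP (column): zero mean, unit (population) variance
G = (G - G.mean(axis=0)) / G.std(axis=0)

# the sum of squared entries equals N * D for unit-standardized data
ss = np.sum(G ** 2)
assert np.isclose(ss, N * D)

# dividing by sqrt(ss / N) is then the same as dividing by sqrt(D)
assert np.allclose(G / np.sqrt(ss / N), G / np.sqrt(D))
```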
In [28]:
import numpy as np
from pysnptools.standardizer.diag_K_to_N import DiagKtoN
from pysnptools.standardizer import Unit
N = 10
D = 100
np.random.seed(42)
m = np.random.random((N,D))
mu = Unit().standardize(m.copy())
# get factor: sum of squares over N should equal D for unit-standardized data
d2 = np.sum(mu**2) / float(N)
print("factor:", d2, "== D")
s = DiagKtoN(N)
s.standardize(m)
K = m.dot(m.T)
# DiagKtoN rescales so that the trace of K equals N
sum_diag = np.sum(np.diag(K))
print("sum of diagonal:", sum_diag)
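The trace property above can be reproduced without pysnptools, assuming DiagKtoN amounts to unit standardization followed by division by $\sqrt{ss/N}$ (a sketch, not the library's actual implementation):

```python
import numpy as np

np.random.seed(42)
N, D = 10, 100
m = np.random.random((N, D))

# unit-standardize columns, then rescale by sqrt(ss / N)
m = (m - m.mean(axis=0)) / m.std(axis=0)
m /= np.sqrt(np.sum(m ** 2) / N)

# the trace of the kernel matrix is now exactly N
K = m.dot(m.T)
assert np.isclose(np.trace(K), N)
```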
In [29]:
# this may not hold true for other standardizers (e.g. beta)...
import numpy as np
from pysnptools.standardizer import Beta
N = 10
D = 100
np.random.seed(42)
m = np.random.random((N,D))
mu = Beta().standardize(m.copy())
# get factor: with Beta standardization, sum of squares over N need not equal D
d2 = np.sum(mu**2) / float(N)
print("factor:", d2, "!= D")
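Even when the per-SNP scaling does not yield $ss/N = D$, dividing by $\sqrt{ss/N}$ still forces the trace of $K$ to equal $N$. A sketch with arbitrary positive column weights standing in for Beta weighting (the exact weights are hypothetical and don't matter for the trace property):

```python
import numpy as np

np.random.seed(42)
N, D = 10, 100
m = np.random.random((N, D))

# center columns, then apply arbitrary positive weights
# (a stand-in for Beta weighting)
m = m - m.mean(axis=0)
m *= np.random.uniform(0.1, 2.0, size=D)

# ss / N is generally not D here ...
ss = np.sum(m ** 2)
print("factor:", ss / N)

# ... but dividing by sqrt(ss / N) still makes trace(K) == N
m /= np.sqrt(ss / N)
assert np.isclose(np.trace(m.dot(m.T)), N)
```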